Abstract

We address the problem of part-of-speech tagging for English data
from the popular microblogging service Twitter. We develop a tagset,
annotate data, develop features, and report tagging results nearing
90% accuracy. The data and tools have been made available to the
research community with the goal of enabling richer text analysis of
Twitter and related social media data sets.

1  Introduction 

The growing popularity of social media and user-created web content
is producing enormous quantities of text in electronic form. The
popular microblogging service Twitter (twitter.com) is one
particularly fruitful source of user-created content, and a flurry
of recent research has aimed to understand and exploit these data
(Ritter et al., 2010; Sharifi et al., 2010; Barbosa and Feng, 2010;
Asur and Huberman, 2010; O’Connor et al., 2010a; Thelwall et al.,
2011). However, the bulk of this work eschews the standard pipeline
of tools which might enable a richer linguistic analysis; such tools
are typically trained on newstext and have been shown to perform
poorly on Twitter (Finin et al., 2010).

One of the most fundamental parts of the linguistic pipeline is
part-of-speech (POS) tagging, a basic form of syntactic analysis
which has countless applications in NLP. Most POS taggers are
trained from treebanks in the newswire domain, such as the Wall
Street Journal corpus of the Penn Treebank (PTB; Marcus et al.,
1993). Tagging performance degrades on out-of-domain data, and
Twitter poses additional challenges due to the conversational nature
of the text, the lack of conventional orthography, and the
140-character limit of each message (“tweet”). Figure 1 shows three
tweets which illustrate these challenges.


(a) @Gunservatively_@ obozo_^ will_V go_V nuts_A when_R PA_^
elects_V a_D Republican_A Governor_^ next_P Tue_^ ._, Can_V you_O
say_V redistricting_V ?_,

(b) Spending_V the_D day_N withhh_P mommma_N !_,

(c) lmao_! ..._, s/o_V to_P the_D cool_A ass_N asian_A officer_N
4_P #1_$ not_R runnin_V my_D license_N and_& #2_$ not_R takin_V
dru_N boo_N to_P jail_N ._, Thank_V u_O God_^ ._, #amen_#

Figure 1: Example tweets with gold annotations. Underlined tokens
show tagger improvements due to features detailed in Section 3
(respectively: TagDict, Metaph, and DistSim).

In this paper, we produce an English POS tagger that is designed
especially for Twitter data. Our contributions are as follows:

•  we  developed  a  POS  tagset  for  Twitter, 

•  we  manually  tagged  1,827  tweets, 

•  we  developed  features  for  Twitter  POS  tagging 
and  conducted  experiments  to  evaluate  them,  and 

•  we  provide  our  annotated  corpus  and  trained  POS 
tagger  to  the  research  community. 

Beyond these specific contributions, we see this work as a case
study in how to rapidly engineer a core NLP system for a new and
idiosyncratic dataset. This project was accomplished in 200
person-hours spread across 17 people and two months. This was made
possible by two things: (1) an annotation scheme that fits the
unique characteristics of our data and provides an appropriate level
of linguistic detail, and (2) a feature set that captures
Twitter-specific properties and exploits existing resources such as
tag dictionaries and phonetic normalization. The success of this
approach demonstrates that with careful design, supervised machine
learning can be applied to rapidly produce effective language
technology in new domains.


Tag  Description                              Examples                       %

Nominal, Nominal + Verbal
N    common noun (NN, NNS)                    books someone                  13.7
O    pronoun (personal/WH; not possessive;    it you u meeee                  6.8
     PRP, WP)
S    nominal + possessive                     books’ someone’s                0.1
^    proper noun (NNP, NNPS)                  lebron usa iPad                 6.4
Z    proper noun + possessive                 America's                       0.2
L    nominal + verbal                         he's book’ll iono               1.6
                                              (= I don’t know)
M    proper noun + verbal                     Mark’ll                         0.0

Other open-class words
V    verb incl. copula, auxiliaries           might gonna ought couldn’t     15.1
     (V*, MD)                                 is eats
A    adjective (J*)                           good fav lil                    5.1
R    adverb (R*, WRB)                         2 (i.e., too)                   4.6
!    interjection (UH)                        lol haha FTW yea right          2.6

Other closed-class words
D    determiner (WDT, DT, WP$, PRP$)          the teh its it’s                6.5
P    pre- or postposition, or subordinating   while to for 2 (i.e., to)       8.7
     conjunction (IN, TO)                     4 (i.e., for)
&    coordinating conjunction (CC)            and n & + BUT                   1.7
T    verb particle (RP)                       out off Up UP                   0.6
X    existential there, predeterminers        both                            0.1
     (EX, PDT)
Y    X + verbal                               there’s all's                   0.0

Twitter/online-specific
#    hashtag (indicates topic/category        #acl                            1.0
     for tweet)
@    at-mention (indicates another user       @BarackObama                    4.9
     as a recipient of a tweet)
~    discourse marker, indications of         RT and : in retweet             3.4
     continuation of a message across         construction
     multiple tweets                          RT @user : hello
U    URL or email address                     http://bit.ly/xyz               1.6
E    emoticon                                 :-) :b (: <3 o_O                1.0

Miscellaneous
$    numeral (CD)                             2010 four 9:30                  1.5
,    punctuation (#, $, '', (, ), ,, ., :,    !!! .... ?!?                   11.6
     ``)
G    other abbreviations, foreign words,      ily (I love you) wby            1.1
     possessive endings, symbols, garbage     (what about you) ’s ->
     (FW, POS, SYM, LS)                       awesome...I’m

Table  1:  The  set  of  tags  used  to  annotate  tweets.  The 
last  column  indicates  each  tag’s  relative  frequency  in  the 
full  annotated  data  (26,435  tokens).  (The  rates  for  M  and 
Y  are  both  <  0.0005.) 


2  Annotation 

Annotation proceeded in three stages. For Stage 0, we developed a
set of 20 coarse-grained tags based on several treebanks but with
some additional categories specific to Twitter, including URLs and
hashtags. Next, we obtained a random sample of mostly American
English1 tweets from October 27, 2010, automatically tokenized them
using a Twitter tokenizer (O’Connor et al., 2010b),2 and pre-tagged
them using the WSJ-trained Stanford POS Tagger (Toutanova et al.,
2003) in order to speed up manual annotation. Heuristics were used
to mark tokens belonging to special Twitter categories, which took
precedence over the Stanford tags.

Stage 1 was a round of manual annotation: 17 researchers corrected
the automatic predictions from Stage 0 via a custom Web interface.
A total of 2,217 tweets were distributed to the annotators in this
stage; 390 were identified as non-English and removed, leaving 1,827
annotated tweets (26,436 tokens).

The annotation process uncovered several situations for which our
tagset, annotation guidelines, and tokenization rules were deficient
or ambiguous. Based on these considerations we revised the
tokenization and tagging guidelines, and for Stage 2, two annotators
reviewed and corrected all of the English tweets tagged in Stage 1.
A third annotator read the annotation guidelines and annotated 72
tweets from scratch, for purposes of estimating inter-annotator
agreement. The 72 tweets comprised 1,021 tagged tokens, of which 80
differed from the Stage 2 annotations, resulting in an agreement
rate of 92.2% and a Cohen’s κ value of 0.914. A final sweep was made
by a single annotator to correct errors and improve consistency of
tagging decisions across the corpus. The released data and tools use
the output of this final stage.
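The agreement figures above can be illustrated with a short sketch. The agreement rate follows directly from the counts in the text (1,021 tokens, 80 disagreements). The κ function is a generic implementation of Cohen's kappa, and the two toy tag sequences are invented purely to exercise it; the reported value of 0.914 cannot be recomputed here because the per-tag confusion counts are not given.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of tokens on which two annotators chose the same tag."""
    assert len(labels_a) == len(labels_b) and labels_a
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators pick the same tag independently.
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Counts taken from the text: 1,021 tokens, 80 disagreements.
agreement_rate = (1021 - 80) / 1021
print(f"{agreement_rate:.1%}")  # 92.2%

# Toy sequences (made up for illustration only).
toy_a = ["N", "V", "N", "D", "N", "V"]
toy_b = ["N", "V", "A", "D", "N", "N"]
print(cohens_kappa(toy_a, toy_b))
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance, which is why both numbers are usually reported together.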

2.1  Tagset 

We set out to develop a POS inventory for Twitter that would be
intuitive and informative, while at the same time simple to learn
and apply, so as to maximize tagging consistency within and across
annotators. Thus, we sought to design a coarse tagset that would
capture standard parts of speech3 (noun, verb, etc.) as well as
categories for token varieties seen mainly in social media: URLs and
email addresses; emoticons; Twitter hashtags, of the form #tagname,
which the author may supply to categorize a tweet; and Twitter
at-mentions, of the form @user, which link to other Twitter users
from within a tweet.

1 We filtered to tweets sent via an English-localized user interface
set to a United States timezone.

2 http://github.com/brendano/tweetmotif

Hashtags and at-mentions can also serve as words or phrases within a
tweet; e.g. Is #qadaffi going down?. When used in this way, we tag
hashtags with their appropriate part of speech, i.e., as if they did
not start with #. Of the 418 hashtags in our data, 148 (35%) were
given a tag other than #: 14% are proper nouns, 9% are common nouns,
5% are multi-word expressions (tagged as G), 3% are verbs, and 4%
are something else. We do not apply this procedure to at-mentions,
as they are nearly always proper nouns.

Another tag, ~, is used for tokens marking specific Twitter
discourse functions. The most popular of these is the RT
(“retweet”) construction to publish a message with attribution. For
example,

RT @USER1 : LMBO ! This man filed an
EMERGENCY Motion for Continuance on
account of the Rangers game tonight ! <<
Wow lmao

indicates that the user @USER1 was originally the source of the
message following the colon. We apply ~ to the RT and : (which are
standard), and also <<, which separates the author’s comment from
the retweeted material.4 Another common discourse marker is ellipsis
dots (...) at the end of a tweet, indicating a message has been
truncated to fit the 140-character limit, and will be continued in a
subsequent tweet or at a specified URL.

Our  first  round  of  annotation  revealed  that,  due  to 
nonstandard  spelling  conventions,  tokenizing  under 
a  traditional  scheme  would  be  much  more  difficult 

3 Our starting point was the cross-lingual tagset presented by
Petrov et al. (2011). Most of our tags are refinements of those
categories, which in turn are groupings of PTB WSJ tags (see
column 2 of Table 1). When faced with difficult tagging decisions,
we consulted the PTB and tried to emulate its conventions as much as
possible.

4 These “iconic deictics” have been studied in other online
communities as well (Collister, 2010).


than for Standard English text. For example, apostrophes are often
omitted, and there are frequently words like ima (short for I’m
gonna) that cut across traditional POS categories. Therefore, we
opted not to split contractions or possessives, as is common in
English corpus preprocessing; rather, we introduced four new tags
for combined forms: {nominal, proper noun} × {verb, possessive}.5
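As a small illustration of how the four combined tags arise, the cross product can be enumerated directly. The tag letters are those listed in Table 1; the dictionary and variable names are only illustrative.

```python
from itertools import product

# Tag letters for the combined forms, as listed in Table 1.
COMBINED_TAGS = {
    ("nominal", "possessive"): "S",
    ("proper noun", "possessive"): "Z",
    ("nominal", "verbal"): "L",
    ("proper noun", "verbal"): "M",
}

heads = ["nominal", "proper noun"]
attachments = ["verbal", "possessive"]

# Enumerate {nominal, proper noun} x {verbal, possessive}.
for head, attachment in product(heads, attachments):
    print(f"{head} + {attachment} -> {COMBINED_TAGS[(head, attachment)]}")
```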

The final tagging scheme (Table 1) encompasses 25 tags. For
simplicity, each tag is denoted with a single ASCII character. The
miscellaneous category G includes multiword abbreviations that do
not fit in any of the other categories, like ily (I love you), as
well as partial words, artifacts of tokenization errors,
miscellaneous symbols, possessive endings,6 and arrows that are not
used as discourse markers.

Figure 2 shows where tags in our data tend to occur relative to the
middle word of the tweet. We see that Twitter-specific tags have
strong positional preferences: at-mentions (@) and Twitter discourse
markers (~) tend to occur towards the beginning of messages, whereas
URLs (U), emoticons (E), and categorizing hashtags (#) tend to occur
near the end.

3  System 

Our tagger is a conditional random field (CRF; Lafferty et al.,
2001), enabling the incorporation of arbitrary local features in a
log-linear model. Our base features include: a feature for each word
type, a set of features that check whether the word contains digits
or hyphens, suffix features up to length 3, and features looking at
capitalization patterns in the word. We then added features that
leverage domain-specific properties of our data, unlabeled in-domain
data, and external linguistic resources.

TwOrth:  Twitter  orthography.  We  have  features 
for  several  regular  expression-style  rules  that  detect 
at-mentions,  hashtags,  and  URLs. 
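A minimal sketch of what such orthographic rules might look like is given below. The regular expressions are hypothetical stand-ins written for illustration; the exact patterns used in the tagger are not given in the text.

```python
import re

# Hypothetical patterns (assumptions, not the tagger's actual rules).
AT_MENTION = re.compile(r"^@\w+$")
HASHTAG = re.compile(r"^#\w+$")
URL = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def tworth_features(token):
    """Return binary Twitter-orthography features for a single token."""
    return {
        "is_at_mention": bool(AT_MENTION.match(token)),
        "is_hashtag": bool(HASHTAG.match(token)),
        "is_url": bool(URL.match(token)),
    }


for tok in ["@BarackObama", "#acl", "http://bit.ly/xyz", "books"]:
    print(tok, tworth_features(tok))
```

Because these are features in a CRF rather than hard rules, a token matching the hashtag pattern can still receive another tag, which matters for hashtags used as ordinary words.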

Names: Frequently-capitalized tokens. Microbloggers are inconsistent
in their use of capitalization, so we compiled gazetteers of tokens
which are frequently capitalized. The likelihood of capitalization
for a token is computed as $\frac{N_{\text{cap}} + \alpha C}{N + C}$, where

5 The modified tokenizer is packaged with our tagger.

6 Possessive endings only appear when a user or the tokenizer has
separated the possessive ending from a possessor; the tokenizer only
does this when the possessor is an at-mention.


Figure 2: Average position, relative to the middle word in the
tweet, of tokens labeled with each tag. Most tags fall between -1
and 1 on this scale; these are not shown.


$N$ is the token count, $N_{\text{cap}}$ is the capitalized token
count, and $\alpha$ and $C$ are the prior probability and its prior
weight.7 We compute features for membership in the top $N$ items by
this metric, for $N \in \{1000, 2000, 3000, 5000, 10000, 20000\}$.
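The smoothed capitalization score can be sketched as follows. The sketch assumes the score has the form (N_cap + αC) / (N + C) with α = 0.01 and C = 10, which is what the Beta(0.1, 9.9) prior mentioned in the footnote implies; the function names and the 1-based rank convention are illustrative assumptions.

```python
def capitalization_score(n_cap, n, alpha=0.01, c=10.0):
    """Smoothed likelihood that a token is capitalized.

    n_cap: number of capitalized occurrences; n: total occurrences.
    alpha * c and (1 - alpha) * c act as pseudo-counts (0.1 and 9.9).
    """
    return (n_cap + alpha * c) / (n + c)


def top_n_features(rank, cutoffs=(1000, 2000, 3000, 5000, 10000, 20000)):
    """Membership features for a token at a given 1-based rank by score."""
    return {f"names_top_{k}": rank <= k for k in cutoffs}


print(capitalization_score(0, 0))    # unseen token falls back to the prior
print(capitalization_score(90, 90))  # always capitalized, 90 occurrences
print(top_n_features(2500))
```

The prior weight keeps rare tokens from reaching the top of the ranking on the strength of one or two capitalized occurrences.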

TagDict: Traditional tag dictionary. We add features for all
coarse-grained tags that each word occurs with in the PTB8
(conjoined with their frequency rank). Unlike previous work that
uses tag dictionaries as hard constraints, we use them as soft
constraints since we expect lexical coverage to be poor and the
Twitter dialect of English to vary significantly from the PTB
domains. This feature may be seen as a form of type-level domain
adaptation.

DistSim: Distributional similarity. When training data is limited,
distributional features from unlabeled text can improve performance
(Schütze and Pedersen, 1993). We used 1.9 million tokens from
134,000 unlabeled tweets to construct distributional features from
the successor and predecessor probabilities for the 10,000 most
common terms. The successor and predecessor transition matrices are
horizontally concatenated into a sparse matrix $M$, which we
approximate using a truncated singular value decomposition:
$M \approx U S V^{\top}$, where $U$ is limited to 50 columns. Each
term’s feature vector is its row in $U$; following Turian et al.
(2010), we standardize and scale the standard deviation to 0.1.
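The first step of this construction, building the concatenated successor/predecessor rows, can be sketched in plain Python. This is an illustration under stated assumptions: the function name is hypothetical, bigrams involving out-of-vocabulary tokens are simply skipped (the text does not say how they were handled), and the truncated SVD and standardization steps are not shown, since they would normally be delegated to a linear algebra library.

```python
from collections import Counter, defaultdict


def transition_rows(tweets, vocab):
    """Build one sparse row per vocabulary term.

    Columns 0..V-1 hold successor probabilities and columns V..2V-1
    hold predecessor probabilities (horizontal concatenation).
    """
    index = {w: i for i, w in enumerate(vocab)}
    succ = defaultdict(Counter)
    pred = defaultdict(Counter)
    for tokens in tweets:
        for left, right in zip(tokens, tokens[1:]):
            if left in index and right in index:
                succ[left][right] += 1
                pred[right][left] += 1
    v = len(vocab)
    rows = {}
    for w in vocab:
        row = {}
        total_s = sum(succ[w].values())
        for nxt, cnt in succ[w].items():
            row[index[nxt]] = cnt / total_s
        total_p = sum(pred[w].values())
        for prv, cnt in pred[w].items():
            row[v + index[prv]] = cnt / total_p
        rows[w] = row
    return rows


toy_tweets = [["i", "love", "you"], ["i", "love", "it"]]
toy_vocab = ["i", "love", "you", "it"]
rows = transition_rows(toy_tweets, toy_vocab)
print(rows["love"])
```

Terms that occur in similar contexts get similar rows, so their low-rank representations end up close together even when one of them never appears in the labeled data.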

Metaph:  Phonetic  normalization.  Since  Twitter 
includes  many  alternate  spellings  of  words,  we  used 
the  Metaphone  algorithm  (Philips,  1990)9  to  create 
a  coarse  phonetic  normalization  of  words  to  simpler 
keys.  Metaphone  consists  of  19  rules  that  rewrite 
consonants  and  delete  vowels.  For  example,  in  our 

7 $\alpha = \frac{1}{100}$, $C = 10$; this score is equivalent to
the posterior probability of capitalization with a Beta(0.1, 9.9)
prior.

8 Both WSJ and Brown corpora, no case normalization. We also tried
adding the WordNet (Fellbaum, 1998) and Moby (Ward, 1996) lexicons,
which increased lexical coverage but did not seem to help
performance.

9 Via the Apache Commons implementation:
http://commons.apache.org/codec/


data, {thangs thanks thanksss thanx thinks thnx} are mapped to 0NKS,
and {lmao lmaoo lmaooooo} map to LM. But it is often too coarse;
e.g. {war we’re wear were where worry} map to WR.
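The idea of collapsing spelling variants onto a shared key can be illustrated with a deliberately crude stand-in. The key function below is not Metaphone and does not reproduce its rules or its output keys; it only lowercases, collapses repeated letters, and drops non-initial vowels, which is enough to show how variants fall into one equivalence class. All names are hypothetical.

```python
import re
from collections import defaultdict


def crude_key(token):
    """Crude normalization key (NOT Metaphone): lowercase, collapse
    repeated letters, and drop vowels after the first character."""
    t = re.sub(r"[^a-z]", "", token.lower())
    t = re.sub(r"(.)\1+", r"\1", t)
    if not t:
        return ""
    return t[0] + re.sub(r"[aeiou]", "", t[1:])


def group_by_key(tokens, key=crude_key):
    """Group tokens into equivalence classes under the given key."""
    groups = defaultdict(set)
    for tok in tokens:
        groups[key(tok)].add(tok)
    return dict(groups)


print(group_by_key(["with", "withhh", "lmao", "lmaoo", "lmaooooo"]))
```

A real phonetic algorithm also rewrites consonants, which is what lets it merge pairs such as thanks and thanx; the crude key above keeps those apart.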

We include two types of features. First, we use the Metaphone key
for the current token, complementing the base model’s word features.
Second, we use a feature indicating whether a tag is the most
frequent tag for PTB words having the same Metaphone key as the
current token. (The second feature was disabled in both -TagDict and
-Metaph ablation experiments.)

4  Experiments 

Our evaluation was designed to test the efficacy of this feature set
for part-of-speech tagging given limited training data. We randomly
divided the set of 1,827 annotated tweets into a training set of
1,000 (14,542 tokens), a development set of 327 (4,770 tokens), and
a test set of 500 (7,124 tokens). We compare our system against the
Stanford tagger. Due to the different tagsets, we could not apply
the pre-trained Stanford tagger to our data. Instead, we retrained
it on our labeled data, using a standard set of features: words
within a 5-word window, word shapes in a 3-word window, and up to
length-3 prefixes, length-3 suffixes, and prefix/suffix pairs.10 The
Stanford system was regularized using a Gaussian prior of
$\sigma^2 = 0.5$ and our system with a Gaussian prior of
$\sigma^2 = 5.0$, tuned on development data.

The  results  are  shown  in  Table  2.  Our  tagger  with 
the  full  feature  set  achieves  a  relative  error  reduction 
of  25%  compared  to  the  Stanford  tagger.  We  also 
show  feature  ablation  experiments,  each  of  which 
corresponds  to  removing  one  category  of  features 
from  the  full  set.  In  Figure  1,  we  show  examples 
that  certain  features  help  solve.  Underlined  tokens 

10 We used the following feature modules in the Stanford tagger:
bidirectional5words, naacl2003unknowns, wordshapes(-3,3),
prefix(3), suffix(3), prefixsuffix(3).


                             Dev.    Test
Our tagger, all features     88.67   89.37
independent ablations:
  -DistSim                   87.88   88.31   (-1.06)
  -TagDict                   88.28   88.31   (-1.06)
  -TwOrth                    87.51   88.37   (-1.00)
  -Metaph                    88.18   88.95   (-0.42)
  -Names                     88.66   89.39   (+0.02)
Our tagger, base features    82.72   83.38
Stanford tagger              85.56   85.85
Annotator agreement          92.2

Table 2: Tagging accuracies on development and test data, including
ablation experiments. Features are ordered by importance: test
accuracy decrease due to ablation (final column).

Table 3: Accuracy (recall) rates per class, in the test set with the
full model. (Omitting tags that occur less than 10 times in the test
set.) For each gold category, the most common confusion is shown.

are  incorrect  in  a  specific  ablation,  but  are  corrected 
in  the  full  system  (i.e.  when  the  feature  is  added). 
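The 25% relative error reduction over the Stanford tagger can be checked from the test accuracies in Table 2. The short sketch below treats error as 100 minus accuracy; the function name is illustrative.

```python
def relative_error_reduction(baseline_acc, system_acc):
    """Share of the baseline's errors removed by the system.

    Both arguments are accuracies in percent.
    """
    baseline_err = 100.0 - baseline_acc
    system_err = 100.0 - system_acc
    return (baseline_err - system_err) / baseline_err


# Test-set accuracies from Table 2: Stanford 85.85, full system 89.37.
rer = relative_error_reduction(85.85, 89.37)
print(f"{rer:.1%}")  # 24.9%
```

The error rate drops from 14.15% to 10.63%, a difference of 3.52 points, which is roughly a quarter of the baseline's errors.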

The -TagDict ablation gets elects, Governor, and next wrong in
tweet (a). These words appear in the PTB tag dictionary with the
correct tags, and thus are fixed by that feature. In (b), withhh is
initially misclassified as an interjection (likely caused by
interjections with the same suffix, like ohhh), but is corrected by
Metaph, because it is normalized to the same equivalence class as
with. Finally, s/o in tweet (c) means “shoutout”, which appears only
once in the training data; adding DistSim causes it to be correctly
identified as a verb.

Substantial challenges remain; for example, despite the Names
feature, the system struggles to identify proper nouns with
nonstandard capitalization. This can be observed from Table 3, which
shows the recall of each tag type: the recall of proper nouns (^) is
only 71%. The system also struggles with the miscellaneous category
(G), which covers many rare tokens, including obscure symbols and
artifacts of tokenization errors. Nonetheless, we are encouraged by
the success of our system on the whole, leveraging out-of-domain
lexical resources (TagDict), in-domain lexical resources (DistSim),
and sublexical analysis (Metaph).

Finally, we note that, even though 1,000 training examples may seem
small, the test set accuracy when training on only 500 tweets drops
to 87.66%, a decrease of only 1.7% absolute.

5  Conclusion 

We have developed a part-of-speech tagger for Twitter and have made
our data and tools available to the research community at
http://www.ark.cs.cmu.edu/TweetNLP. More generally, we believe that
our approach can be applied to address other linguistic analysis
needs as they continue to arise in the era of social media and its
rapidly changing linguistic conventions. We also believe that the
annotated data can be useful for research into domain adaptation and
semi-supervised learning.